
Exponential Smoothing

Lecture 7  ·  ETS Models

How should recent observations be weighted relative to older ones?

Simple exponential smoothing gives more weight to recent observations.
The SES forecast is a weighted average of all past observations, with weights that decay exponentially:
ŷT+1|T = αyT + α(1−α)yT−1 + α(1−α)²yT−2 + …
α is the smoothing parameter, 0 < α ≤ 1.
  • α near 1: almost all weight on the most recent observation — the forecast tracks the data closely but is noisy.
  • α near 0: weight is spread broadly across history — the forecast is very smooth but slow to react to change.
α is estimated from the data by minimizing the sum of squared one-step forecast errors.
SES is equivalently expressed as a state-space model with a level component.
Forecast equation:
ŷt+h|t = ℓt
Level equation:
ℓt = αyt + (1−α)ℓt−1
The level ℓt is a weighted average of the current observation and the previous level estimate. All future forecasts equal the current level — SES produces a flat forecast.
SES is optimal when the series has no trend and no seasonality — it is equivalent to an ARIMA(0,1,1) model.
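The course code uses fpp3 in R, but the SES recursion itself is language-agnostic. Below is a minimal Python sketch (the series and α values are invented examples) showing that the recursive level form and the exponentially weighted average give the same forecast, and how α would be chosen by minimizing the sum of squared one-step errors:

```python
# Illustrative sketch (Python, not fpp3); the data and α are made-up examples.

def ses_level(y, alpha):
    """Recursive level update ℓt = α·yt + (1−α)·ℓ(t−1), starting from ℓ0 = y0.
    Returns the final level, which is the flat SES forecast."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def ses_weighted(y, alpha):
    """Same forecast via the exponentially decaying weighted average."""
    T = len(y)
    f = sum(alpha * (1 - alpha) ** j * y[T - 1 - j] for j in range(T - 1))
    return f + (1 - alpha) ** (T - 1) * y[0]  # leftover weight on ℓ0 = y0

def ses_sse(y, alpha):
    """Sum of squared one-step errors, the criterion minimized to estimate α."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        sse += (obs - level) ** 2  # error e(t) = y(t) − ℓ(t−1)
        level = alpha * obs + (1 - alpha) * level
    return sse

y = [10.0, 12.0, 11.0, 13.0, 12.5]
# crude grid search over α; in practice the optimizer handles this
best_alpha = min((a / 100 for a in range(1, 101)), key=lambda a: ses_sse(y, a))
```

The two forms agreeing term by term is exactly the equivalence claimed above; fpp3's ETS() performs the estimation step internally.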

How does exponential smoothing handle a trending series?

Holt’s linear trend method adds a trend component to SES.
Forecast equation: ŷt+h|t = ℓt + hbt
Level equation: ℓt = αyt + (1−α)(ℓt−1 + bt−1)
Trend equation: bt = β*(ℓt − ℓt−1) + (1−β*)bt−1
Two smoothing parameters: α (level) and β* (trend), both in (0, 1].
Problem: Holt’s method extrapolates the trend indefinitely into the future, which is often unrealistic at long horizons. Trends rarely continue unchanged forever.
The damped trend method prevents over-extrapolation.
A damping parameter φ (0 < φ ≤ 1) multiplies the trend at each step, causing it to shrink toward zero as the horizon increases:
ŷt+h|t = ℓt + (φ + φ² + … + φʰ)bt
  • φ = 1: identical to Holt’s (no damping).
  • φ near 0: the trend is heavily damped; forecasts quickly flatten.
  • φ = 0.88–0.98: typical estimated values; the trend fades slowly but persistently.
The damped trend method is one of the most accurate and robust methods across a wide variety of series — it is the recommended default when a trend is present.
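To see the flattening effect concretely, here is a short Python sketch of the damped forecast equation (the level, trend, and φ values are invented; the course itself works in fpp3/R). Because the damping sum φ + φ² + … + φʰ converges to φ/(1−φ) when φ < 1, long-horizon forecasts level off instead of growing without bound:

```python
# Illustrative sketch (Python, not fpp3): ŷ(t+h|t) = ℓt + (φ + φ² + … + φ^h)·bt.
# The states ℓ = 100, b = 2 and φ = 0.9 are made-up values.

def damped_forecast(level, trend, phi, h):
    damp = sum(phi ** j for j in range(1, h + 1))  # φ + φ² + … + φ^h
    return level + damp * trend

level, trend, phi = 100.0, 2.0, 0.9
short = damped_forecast(level, trend, phi, h=1)    # ≈ 100 + 0.9·2 = 101.8
long = damped_forecast(level, trend, phi, h=500)   # ≈ 100 + 2·0.9/0.1 = 118
# With φ = 1 the damping sum is just h, recovering Holt's linear trend.
```

The φ = 1 case collapsing to Holt's method is why damping is a strict generalization rather than a different model.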

How is seasonality incorporated into exponential smoothing?

The Holt-Winters method adds a seasonal component.
Additive Holt-Winters (for constant seasonal amplitude):
ŷt+h|t = ℓt + hbt + st+h−m(k+1)
ℓt = α(yt − st−m) + (1−α)(ℓt−1 + bt−1)
bt = β*(ℓt − ℓt−1) + (1−β*)bt−1
st = γ(yt − ℓt−1 − bt−1) + (1−γ)st−m
Three smoothing parameters: α (level), β* (trend), γ (seasonal). As before, α and β* lie in (0, 1]; γ satisfies 0 ≤ γ ≤ 1 − α. m is the seasonal period; k = ⌊(h−1)/m⌋ ensures the forecast uses the most recent seasonal index estimates.
Multiplicative seasonality works when the seasonal amplitude grows with the level.
Multiplicative Holt-Winters — the seasonal component multiplies (rather than adds to) the level:
ŷt+h|t = (ℓt + hbt) · st+h−m(k+1)
The seasonal indices now represent multipliers (e.g., 1.25 for a month 25% above average) rather than additive deviations.
Choosing between them: if seasonal swings are roughly the same size every year, use additive. If they grow proportionally with the series level, use multiplicative (or log-transform and use additive).
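The additive Holt-Winters equations above are just three coupled updates per observation. Here is a minimal Python sketch of one update step (the smoothing parameters, initial states, and m = 4 quarterly data are invented for illustration; fpp3 handles all of this inside ETS()):

```python
# Illustrative sketch (Python, not fpp3) of one additive Holt-Winters update.
# α = 0.4, β* = 0.2, γ = 0.1, and the states below are made-up values.

def hw_additive_update(y, level, trend, seasonal, alpha, beta, gamma, m):
    """Apply the level/trend/seasonal equations for one new observation y.
    `seasonal` holds the last m seasonal indices, oldest first."""
    s_old = seasonal[0]  # s(t−m), the index from one season ago
    new_level = alpha * (y - s_old) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_season = gamma * (y - level - trend) + (1 - gamma) * s_old
    return new_level, new_trend, seasonal[1:] + [new_season]

def hw_forecast(level, trend, seasonal, m, h):
    """h-step forecast: level + h·trend + the matching seasonal index."""
    return level + h * trend + seasonal[(h - 1) % m]

# One quarter with a −5 seasonal dip: observe y = 97 against level 100, trend 1.
l, b, s = hw_additive_update(97.0, 100.0, 1.0, [-5.0, 0.0, 5.0, 0.0],
                             alpha=0.4, beta=0.2, gamma=0.1, m=4)
```

For the multiplicative variant, the subtractions of `s_old` become divisions and the forecast adds the trend before multiplying by the seasonal index.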

What unifies all these methods into a single framework?

ETS stands for Error, Trend, Seasonal.
The ETS framework (Hyndman et al., 2002) expresses every exponential smoothing method as a statistical state-space model with three components, each taking one of several forms:
Error (E): Additive (A) or Multiplicative (M).
Trend (T): None (N), Additive (A), Additive Damped (Ad).
Seasonal (S): None (N), Additive (A), Multiplicative (M).
ETS(A,N,N) = SES  ·  ETS(A,A,N) = Holt’s  ·  ETS(A,Ad,N) = Damped trend  ·  ETS(A,A,A) = Holt-Winters additive.
The ETS notation encodes model structure compactly.
Trend \ (Error: A)   Seasonal: N        Seasonal: A         Seasonal: M
None (N)             A,N,N — SES        A,N,A               A,N,M
Additive (A)         A,A,N — Holt       A,A,A — HW add.     A,A,M — HW mult.
Damped (Ad)          A,Ad,N — Damped    A,Ad,A              A,Ad,M
The named cells are the most commonly used models. With trend restricted to N, A, and Ad, there are 18 possible ETS specifications in total (2 error × 3 trend × 3 seasonal forms; the table shows the 9 additive-error models). ETS(y) in fpp3 selects the best-fitting one automatically via AICc.
ETS selects the best model using AICc.
ETS(y) fits the candidate models (up to 18, excluding a few numerically unstable combinations) and returns the one with the lowest AICc, the information criterion that penalizes complexity for small samples.
You can also constrain the search:
# Let fpp3 choose automatically
model(ETS(y))

# Force a specific model
model(ETS(y ~ error("A") + trend("Ad") + season("M")))
Important: AICc chooses the best in-sample fit penalized for complexity. Always confirm the selection makes sense with a residual diagnostic (gg_tsresiduals()) and compare out-of-sample accuracy with benchmarks.
The state-space formulation provides exact prediction intervals.
The ETS state-space model has two equations:
Measurement equation: yt = h(xt−1) + k(xt−1)εt
State equation: xt = f(xt−1) + g(xt−1)εt
xt is the state vector (level, trend, seasonal). εt is i.i.d. with mean zero.
Additive-error models give normally distributed prediction intervals. Multiplicative-error models require simulation (bootstrap) for exact intervals but are often better for series that cannot be negative.
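The simulation approach for multiplicative-error intervals is straightforward to sketch. Below is a Python illustration for the simplest case, ETS(M,N,N), where yt = ℓ(t−1)(1 + εt) and ℓt = ℓ(t−1)(1 + αεt); the starting level, α, σ, and horizon are all invented values, and real software would simulate many more paths:

```python
# Illustrative sketch (Python, not fpp3): bootstrap prediction intervals
# for ETS(M,N,N).  level = 100, α = 0.3, σ = 0.05, h = 6 are made-up values.
import random

def simulate_paths(level, alpha, sigma, h, n_paths, seed=1):
    """Simulate n_paths sample paths h steps ahead; return sorted final values."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        lvl, y = level, level
        for _ in range(h):
            eps = rng.gauss(0.0, sigma)
            y = lvl * (1 + eps)            # measurement equation
            lvl = lvl * (1 + alpha * eps)  # state (level) update
        finals.append(y)
    return sorted(finals)

paths = simulate_paths(level=100.0, alpha=0.3, sigma=0.05, h=6, n_paths=2000)
lo = paths[int(0.025 * len(paths))]  # 2.5th percentile
hi = paths[int(0.975 * len(paths))]  # 97.5th percentile
# (lo, hi) approximates a 95% interval for y(t+6); it is slightly
# asymmetric because the errors scale with the level.
```

The same percentile-of-simulated-paths idea is what forecast() applies when analytic normal intervals are unavailable.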

When does ETS outperform regression?

ETS requires no external predictors.
  • When you don’t have reliable predictors available at the forecast horizon, ETS needs only the history of yt itself.
ETS adapts to changing levels and trends automatically.
  • A regression model with a fixed trend line cannot adapt if the growth rate shifts mid-series. ETS updates the level and trend at every period.
ETS excels at short-to-medium horizon forecasts for business series.
  • Retail sales, energy demand, inventory — series that have clear level, trend, and seasonal patterns but no obvious external drivers.
Use regression (or dynamic regression) when:
  • You have reliable predictors that improve accuracy beyond what the past series alone can offer.

ETS in fpp3: a complete workflow

Fit and auto-select:
  • fit <- data |> model(ETS(y))
  • report(fit)     # shows selected model and parameters
Diagnose residuals:
  • gg_tsresiduals(fit)   # ACF, histogram, time plot of residuals
Forecast and plot:
  • fc <- fit |> forecast(h = 24)
  • fc |> autoplot(data)   # with 80% and 95% PI shading
Evaluate accuracy:
  • accuracy(fc, test_data) |> select(MASE, RMSE)

Chapter 8 in summary

Exponential smoothing weights recent observations more heavily than old ones.
  • The smoothing parameters (α, β*, γ, φ) are estimated by minimizing squared errors.
The ETS framework unifies all variants as state-space models.
  • Error (A or M), Trend (N, A, Ad), Seasonal (N, A, M) — 18 possible combinations.
AICc selects the best ETS model automatically.
  • Always verify with residual diagnostics and out-of-sample accuracy.
Damped trend is the single best default for trended series.
  • ETS(A,Ad,N) or ETS(A,Ad,M) depending on seasonality.

Key Terms

ETS(A,N,N) is identical to ARIMA(0,1,1).
This equivalence shows that exponential smoothing and ARIMA models are not competing philosophies — they are closely related families. Specifically:
  • SES = ETS(A,N,N) = ARIMA(0,1,1) with θ1 = α − 1.
  • Holt’s linear method = ETS(A,A,N) = ARIMA(0,2,2).
  • Damped trend = ETS(A,Ad,N) = ARIMA(1,1,2).
The key practical difference: ARIMA models can capture more complex autocorrelation structures (including moving average components and long-memory), while ETS focuses on a structured decomposition of level, trend, and seasonality. In practice, run both and compare on test data.
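The SES/ARIMA(0,1,1) equivalence can be checked numerically. Here is a Python sketch (the series and α = 0.4 are invented) comparing SES one-step forecasts against the ARIMA(0,1,1) forecast recursion ŷ(t+1|t) = yt + θεt with θ = α − 1, both initialized at y0:

```python
# Illustrative sketch (Python, not fpp3): SES and ARIMA(0,1,1) with
# θ = α − 1 produce identical one-step forecasts.  Data are made up.

def ses_path(y, alpha):
    """One-step SES forecasts ŷ(t+1|t) for t = 1..T−1."""
    level, out = y[0], []
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
        out.append(level)
    return out

def arima011_path(y, theta):
    """ARIMA(0,1,1) forecasts ŷ(t+1|t) = y(t) + θ·ε(t), ε(t) = y(t) − ŷ(t|t−1)."""
    fcast, out = y[0], []
    for obs in y[1:]:
        eps = obs - fcast      # one-step forecast error
        fcast = obs + theta * eps
        out.append(fcast)
    return out

y = [10.0, 12.0, 11.0, 13.0, 12.5]
alpha = 0.4
a = ses_path(y, alpha)
b = arima011_path(y, theta=alpha - 1)
# a and b agree term by term (up to floating-point rounding)
```

Substituting θ = α − 1 into the ARIMA recursion gives ŷ(t+1|t) = ℓ(t−1) + α·e(t), which is exactly the SES error-correction form.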